The intention of this data audit is to find any records from the Neotoma Paleoecology Database which potentially violate Neotoma’s statement of values, especially with respect to Neotoma’s goal of aligning with principles of Indigenous data sovereignty. The authors of this audit are white researchers informed by guidance from theorists of Indigenous data sovereignty, paleo scientists, and other colleagues.
Note: the following audit contains potentially sensitive information about Indigenous ancestors. Our intent is to expose this information in order to work toward better management in the future.
We performed a spatial join between every site in Neotoma with a unique site ID and shapefiles of the borders of federal Indigenous reservations in the United States and Canada and of Indigenous protected areas in Australia. We then tallied and mapped all sites that intersected the borders of federal reservations. See list below.
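The spatial join described above can be illustrated with a small, dependency-free sketch. It is not the code used for the audit: it assumes each site is a single point, substitutes a plain ray-casting point-in-polygon test for a GIS library's spatial join, and uses made-up boundary and site data.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges an eastward ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sites_within(sites, boundaries: Dict[str, List[Point]]):
    """Return (siteid, boundary name) pairs for sites inside any boundary."""
    hits = []
    for siteid, lon, lat in sites:
        for name, polygon in boundaries.items():
            if point_in_polygon((lon, lat), polygon):
                hits.append((siteid, name))
    return hits


# Hypothetical boundary and sites, for illustration only.
boundaries = {
    "Example area": [(-100.0, 45.0), (-99.0, 45.0), (-99.0, 46.0), (-100.0, 46.0)]
}
sites = [(1, -99.5, 45.5), (2, -98.0, 45.5)]  # (siteid, longitude, latitude)
matches = sites_within(sites, boundaries)
```

In practice a GIS library would handle multipart polygons, holes, and coordinate reference systems, which this sketch ignores.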

Our next steps are…
For every site, we checked whether both the latitude and the longitude were exactly divisible by 0.25, or whether either the latitude or the longitude had a precision of 0 or 1 decimal places. Sites meeting either condition were classified as fuzzed. Note that this is a conservative method: there are likely fuzzed sites in Neotoma whose coordinates are not exactly divisible by 0.25, or that have a precision greater than 1 decimal place. We found 4697 such fuzzed sites. One table below documents their site IDs and names, the next table counts datasets by dataset type and by the constituent database from which each dataset derives, and the map below documents their locations.
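A minimal sketch of the fuzzing rule described above follows. It assumes coordinates are available as decimal strings (so that precision is not distorted by floating-point storage) and that trailing zeros do not count toward precision. The function names are hypothetical.

```python
from decimal import Decimal

GRID = Decimal("0.25")  # grid spacing used in the divisibility check


def decimal_places(value: str) -> int:
    """Digits after the decimal point, ignoring trailing zeros (an assumption)."""
    exponent = Decimal(value).normalize().as_tuple().exponent
    return max(0, -exponent)


def is_fuzzed(lat: str, lon: str) -> bool:
    """Apply the two conditions from the text to one coordinate pair."""
    # Condition 1: both coordinates fall exactly on the 0.25-degree grid.
    on_grid = Decimal(lat) % GRID == 0 and Decimal(lon) % GRID == 0
    # Condition 2: either coordinate has at most one decimal place.
    low_precision = decimal_places(lat) <= 1 or decimal_places(lon) <= 1
    return on_grid or low_precision


# Illustrative inputs only; these are not Neotoma sites.
examples = {
    ("45.25", "-93.75"): is_fuzzed("45.25", "-93.75"),
    ("45.1234", "-93.5"): is_fuzzed("45.1234", "-93.5"),
    ("45.1234", "-93.5678"): is_fuzzed("45.1234", "-93.5678"),
}
```

Using `Decimal` rather than `float` keeps the divisibility test exact; a float-based remainder check could misclassify values that are not exactly representable in binary.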

Our next steps are to refine our definition of fuzzed sites and to speak with constituent database stewards.
We downloaded Neotoma’s taxa table and selected any taxon IDs which might describe people. (Taxon ID 6359 is Primates, and 6171 is Mammalia.)
Then we used a Neotoma API to search for any occurrences of those taxon IDs.
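The text does not say which endpoint was used, so the following is only a sketch of how such a query might be built. The base URL and the `taxonid` query parameter are assumptions that should be checked against the Neotoma API documentation, and the two taxon IDs are simply the ones named above, used as example inputs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint path and parameter name; verify against the API docs.
BASE_URL = "https://api.neotomadb.org/v2.0/data/occurrences"


def occurrence_url(taxon_id: int, base_url: str = BASE_URL) -> str:
    """Build the request URL for one taxon ID."""
    return f"{base_url}?{urlencode({'taxonid': taxon_id})}"


def fetch_occurrences(taxon_id: int):
    """Request occurrences for one taxon ID and return the parsed JSON body."""
    with urlopen(occurrence_url(taxon_id), timeout=30) as response:
        return json.load(response)


# Example inputs only: the two taxon IDs mentioned in the text.
taxon_ids = [6359, 6171]
urls = [occurrence_url(t) for t in taxon_ids]
```

One request is issued per taxon ID so that the sketch does not depend on whether the service accepts a list of IDs in a single call.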
The two maps below show the sites they come from, and the table documents what information there is about those samples from the samples table in Neotoma. Rows colored red are sensitivity level 1 because they come from North America. Rows colored orange are sensitivity level 2 because they come from elsewhere.
It should be noted that lead FAUNMAP steward Jessica Blois has removed all sample-level Homo sapiens occurrences from public access as the Database works on a policy for managing these data.

The table below counts sample records by sensitivity and constituent database.
Our next steps are to reach out to the lead stewards for the Faunal Isotope Database and PaVeLa, so they can come to a decision about managing these human records in their databases.
We searched through two fields (notes and materialdated) from Neotoma’s geochronology table for any occurrences of words from the dictionary below.
Any rows from the geochronology table which contained one of the dictionary words are listed in the table below. Note that not all of these radiocarbon dates are necessarily problematic, only potentially so. Further scrutiny may be needed. (We also checked against CARD’s list of radiocarbon dates deriving from human ancestors that are duplicated in Neotoma, and there was agreement between the two lists: all 60 of CARD’s records that are also in Neotoma are in the below table.)
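As an illustration of this kind of keyword search, the sketch below flags rows whose `notes` or `materialdated` field contains a dictionary word. The three-word dictionary and the sample rows are hypothetical stand-ins, not the audit's actual dictionary or data, and whole-word matching with an optional plural "s" is an assumption about how matching was done.

```python
import re

# Hypothetical stand-in for the audit's dictionary.
DICTIONARY = ["human", "burial", "grave"]

# Whole-word, case-insensitive match with an optional plural "s",
# so "graves" matches but "gravel" does not.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, DICTIONARY)) + r")s?\b",
    re.IGNORECASE,
)


def flag_rows(rows, fields=("notes", "materialdated")):
    """Return the rows in which any searched field contains a dictionary word."""
    flagged = []
    for row in rows:
        # Treat missing or NULL fields as empty strings.
        text = " ".join(str(row.get(field) or "") for field in fields)
        if PATTERN.search(text):
            flagged.append(row)
    return flagged


# Made-up rows for illustration.
rows = [
    {"geochronid": 1, "notes": "Human bone collagen", "materialdated": None},
    {"geochronid": 2, "notes": "gravel layer", "materialdated": "wood charcoal"},
    {"geochronid": 3, "notes": "", "materialdated": "charcoal from burial pit"},
]
flagged_ids = [row["geochronid"] for row in flag_rows(rows)]
```

A keyword match only marks a record as potentially sensitive, which is why each flagged row still needs to be read in context.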
We assigned sensitivity categories as follows: any references to human bone were assigned sensitivity level 1. Any references to human feces were assigned sensitivity level 2. References to human graves or burials also merited a 2. All other items were given sensitivity level 3. All publications linked to records in which the material dated was taxon-ambiguous bone collagen were consulted. We found that geochron ID 21255 definitely derives from a human, and geochron IDs 29333, 29334, and 29335 likely derive from humans. These records were therefore categorized as sensitivity level 1.
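The categorization rules above can be expressed as a short function. This is a sketch, not the audit's procedure: the keyword tests are assumptions about how "human bone", "human feces", and "graves or burials" might be detected in free text, and only the four geochron IDs are taken from the text.

```python
import re

# Geochron IDs identified in the text through publication review.
MANUAL_LEVEL_1 = {21255, 29333, 29334, 29335}


def has_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive match with an optional plural 's'."""
    return re.search(rf"\b{re.escape(word)}s?\b", text, re.IGNORECASE) is not None


def sensitivity_level(geochron_id: int, text: str) -> int:
    """Assign sensitivity 1 to 3 following the rules described above."""
    if geochron_id in MANUAL_LEVEL_1:
        return 1  # manually confirmed or likely human-derived records
    if has_word(text, "human") and has_word(text, "bone"):
        return 1  # references to human bone
    if has_word(text, "human") and has_word(text, "feces"):
        return 2  # references to human feces
    if has_word(text, "grave") or has_word(text, "burial"):
        return 2  # references to human graves or burials
    return 3  # all other flagged items


# Made-up examples, except for the geochron ID 21255 override.
levels = [
    sensitivity_level(1, "Human bone collagen"),
    sensitivity_level(2, "human feces"),
    sensitivity_level(3, "charcoal from burial pit"),
    sensitivity_level(4, "hearth charcoal"),
    sensitivity_level(21255, "bone collagen"),
]
```

The manual override comes first so that records reclassified after consulting publications keep level 1 regardless of their wording.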
Below the color-coded table, we count records by their sensitivity and the constituent database of which they are a part.
Our next step is to reach out to the stewards of the relevant constituent databases.
We used the same dictionary from the last query to search through two fields in Neotoma’s collection units table (location and notes). Any collection units that matched one of the dictionary words are reproduced below. The records were individually scrutinized and categorized subjectively into sensitivity categories.
Lastly, we counted the number of records by their constituent database and by their sensitivity. Note that the count here is greater than the total number of collection units because constituent databases are linked to datasets, not collection units, and multiple datasets can derive from a single collection unit. (We did exclude the Neotoma dataset type “geochronologic”.)
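The small example below, built on made-up records, shows why counting by constituent database can exceed the number of collection units: the count is taken over datasets, and one collection unit can have several. The field names and database labels are hypothetical.

```python
from collections import Counter

# Hypothetical flagged collection units: collectionunitid -> sensitivity level.
collection_units = {101: 1, 102: 3}

# Hypothetical datasets; each links one collection unit to a constituent database.
datasets = [
    {"datasetid": 1, "collectionunitid": 101, "datasettype": "pollen", "database": "Database A"},
    {"datasetid": 2, "collectionunitid": 101, "datasettype": "vertebrate fauna", "database": "Database B"},
    {"datasetid": 3, "collectionunitid": 101, "datasettype": "geochronologic", "database": "Database A"},
    {"datasetid": 4, "collectionunitid": 102, "datasettype": "pollen", "database": "Database A"},
]

# Count (database, sensitivity) pairs over datasets, skipping geochronologic ones.
counts = Counter(
    (d["database"], collection_units[d["collectionunitid"]])
    for d in datasets
    if d["datasettype"] != "geochronologic"
    and d["collectionunitid"] in collection_units
)

total_records = sum(counts.values())
```

Here two collection units yield three counted records, because collection unit 101 contributes one dataset to each of two constituent databases.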
Our next step is to scrutinize these records more closely.